ABSTRACT

Social (or folksonomic) tagging has become a very popular 
way to describe content within Web 2.0 websites. However, 
as tags are informally defined, continually changing, and 
ungoverned, it has often been criticised for lowering, rather 
than increasing, the efficiency of searching. To address this 
issue, a variety of approaches have been proposed that
recommend to users which tags to use, both when labeling and
when looking for resources. These techniques work well in
dense folksonomies, but they fail to do so when tag usage
exhibits a power law distribution, as often happens in
real-life folksonomies. To tackle this issue, we propose an
approach that induces the creation of a dense folksonomy,
in a fully automatic and transparent way: when users label
resources, an innovative tag similarity metric is deployed, so
as to enrich the chosen tag set with related tags already present
in the folksonomy. The proposed metric, which represents
the core of our approach, is based on the mutual reinforcement
principle. Our experimental evaluation shows that
the accuracy and coverage of searches achieved with our
metric are higher than those achieved by applying classical
metrics.
1. INTRODUCTION

Social media applications, such as blogs, multimedia sharing
sites, question answering systems, wikis and online
forums, are growing at an unprecedented rate and are
estimated to generate a significant amount of the content 
currently available on the Web. This has exponentially in- 
creased the amount of information that is available to users, 
from videos on sites like YouTube and MySpace, to pic- 
tures on Flickr, music on Last.fm, blogs on Blogger, and 
so on. This content is no longer categorised according to 
pre-defined taxonomies (or ontologies). Rather, a new trend 
called social (or folksonomic) tagging has emerged, and 
quickly become the most popular way to describe content 
within Web 2.0 websites. Unlike taxonomies, which impose
a hierarchical categorisation of content, folksonomies
empower end users by enabling them to freely create and
choose the tags that best describe a piece of information (a
picture, a document, a blog entry, a video clip, etc.). However,
this freedom comes at a cost: since tags are informally
defined, continually changing, and ungoverned, finding content
of interest has become a major challenge because of the
number of synonyms, homonyms and polysemies, as well as
the inevitable heterogeneity of users and the noise they
introduce.

In order to assist users in finding content of interest
within this information abundance, new approaches, inspired
by traditional recommender systems, have been developed
[3] [14] [13]. These often exploit an underlying tag similarity
measure; whenever a user labels a resource or searches
for it by adopting a set of tags, they suggest new tags to be
added to the resource label or to the user query, on the basis
of their similarity to the original tags expressed by the
user herself. They do so to increase the chances of finding
relevant content in these extremely sparse settings.

Various classic metrics have been used to compute tag similarity,
including, for instance, cosine similarity, the Jaccard
coefficient, and Pearson correlation. Some of the approaches
exploiting these metrics [2] [3] have been shown to achieve
excellent results; however, they do so only if the underlying
folksonomy is already dense, and they operate by making it
even denser. However, we observe that this density assumption
does not hold true; rather, most real life folksonomies exhibit 
a power law distribution of tag usage Q] [2], with few tags 
labeling most resources, and most tags labeling just a few 
resources instead. This means that, in practical cases, if we
were to select any two tags, the probability that they
jointly label at least one resource is extremely low. As a
result, computing tag similarity on real folksonomies, using
traditional metrics like cosine similarity, would almost always
yield close-to-zero values, thus failing to support users
in retrieving resources relevant to their queries.

In this paper, we propose an approach that transparently 
induces the creation of a dense folksonomy, thus supporting 
the effective retrieval of resources by construction. At the 
core of our approach lies an innovative tag similarity metric, 
used to recommend tags both when labeling resources and 
when querying the folksonomy. This metric is based on the 
mutual reinforcement principle, and is thus computed by means
of an iterative algorithm: two tags are deemed similar if
they label similar resources and, vice versa, two resources
are similar if they have been labeled by similar tags. When
a user labels a new resource, or when she submits a
query to retrieve some resources, the above metric is used
to automatically expand the user-selected tag set with those
tags, already present within the folksonomy, that are: (i)
most similar to those she initially submitted, and (ii) among
those most widely used in the folksonomy.

We have conducted an extensive experimental evaluation
on two large-scale datasets, namely BibSonomy and CiteULike.
The obtained results show that our similarity
metric operates effectively even in very sparse settings,
where traditional metrics (including cosine similarity, SimRank
[10] and Latent Semantic Indexing (LSI) [11]) fail.

2. DESCRIPTION OF OUR APPROACH 

In this section we provide a detailed description of our 
approach to support effective resource retrieval in large-scale 
folksonomies. Before illustrating it, we formalize the concept 
of a folksonomy as done in [7] . 

Definition 2.1. Let $US = \{u_1, \ldots, u_{n_u}\}$ be a set of users,
$RS = \{r_1, \ldots, r_{n_r}\}$ a set of resource URIs, and
$TS = \{t_1, \ldots, t_{n_t}\}$ a set of tags. A folksonomy F is a tuple
$F = (US, RS, TS, AS)$, where $AS \subseteq US \times RS \times TS$ is a
ternary relationship called tag assignment set. □

In the above definition we do not make any assumption
about the nature of resources; they could be URLs of
Web pages (as in Delicious), photos (as in
Flickr), music files (as in Last.fm), documents (as in CiteULike),
and so on.

According to Definition 2.1, a folksonomy F is a "three-dimensional"
data structure whose "dimensions" are represented
by users, tags and resources. In particular, an element
$a \in AS$ is a triple $(u, r, t)$, indicating that user u
labeled resource r with tag t. To simplify folksonomy modelling
and management, the inherent tripartite graph structure
is often mapped into three matrices, whereby each matrix
models one relationship at a time [12].

In this paper, we adopt the same matrix-based representation.
Specifically, the association between tags and resources
can be modelled by an $n_t \times n_r$ matrix TR, called
Tag-Resource matrix, where $n_t$ and $n_r$ are the number of tags
and resources, respectively. The generic entry $TR_{ij}$ of such a
matrix is the number of times the i-th tag labels the j-th
resource. In an analogous fashion we can introduce the Tag-User
and the Resource-User matrices, TU and RU.
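As an illustration, the Tag-Resource matrix could be assembled from the tag assignment set roughly as follows. This is a minimal sketch: the function name `build_tag_resource_matrix`, the sample triples, and the reading of "number of times" as "number of distinct users who made the assignment" are assumptions of the example, not part of the formal definition.

```python
import numpy as np

def build_tag_resource_matrix(assignments):
    """Build TR from (user, resource, tag) triples.

    TR[i, j] counts the distinct triples that attach tag i to resource j
    (assumed interpretation of "number of times tag i labels resource j").
    """
    tags = sorted({t for _, _, t in assignments})
    resources = sorted({r for _, r, _ in assignments})
    tag_index = {t: i for i, t in enumerate(tags)}
    resource_index = {r: j for j, r in enumerate(resources)}
    tr = np.zeros((len(tags), len(resources)))
    # AS is a set, so duplicated triples are counted only once.
    for _, r, t in set(assignments):
        tr[tag_index[t], resource_index[r]] += 1
    return tr, tags, resources

# Hypothetical tag assignments, for illustration only.
assignments = [
    ("alice", "paper-1", "folksonomy"),
    ("bob", "paper-1", "folksonomy"),
    ("alice", "paper-2", "tagging"),
]
tr, tags, resources = build_tag_resource_matrix(assignments)
```

The Tag-User and Resource-User matrices could be built in the same way by projecting the triples on a different pair of components.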

Our approach consists of two phases: the first, executed
offline, computes pairwise tag similarities by means
of an innovative tag similarity metric. The second phase,
executed in real time (online), follows a well-established process:
when a user is labeling a new resource, or is querying
the system to retrieve some resources, the tag set she has
chosen is automatically expanded using the tags that are
deemed most related to the user-selected ones, based on the
similarities previously computed. We now illustrate each
phase in more detail; we then conclude this section with a
discussion on how our approach can be efficiently realised in
practice.

2.1 Phase 1: Tag Similarity Computation 

As previously pointed out, this phase aims at computing
the pairwise similarity of all tags in use within the folksonomy.
A variety of metrics have been proposed in the literature,
mostly based on tag co-occurrence (such as cosine similarity);
however, we claim these approaches fail to work in the
sparse settings we target. To see why, let us consider cosine
similarity. Given an arbitrary pair of tags $t_i$ and $t_j$, their
cosine similarity $s(t_i, t_j)$ would be computed as

$$ s(t_i, t_j) = \frac{\langle t_r(i), t_r(j) \rangle}{\sqrt{\langle t_r(i), t_r(i) \rangle} \cdot \sqrt{\langle t_r(j), t_r(j) \rangle}} \qquad (1) $$

where $t_r(i)$ and $t_r(j)$ denote the i-th and the j-th row of TR.

Equation (1) states that the similarity score of a pair of tags
is high if they co-occur in labeling the same subset
of resources. One important underlying assumption must
hold for cosine similarity to work well: matrix TR must be
densely populated. Unfortunately, this assumption does not
hold in real folksonomies.

As an example, let us consider a real-world folksonomy
like BibSonomy. BibSonomy [6] [8] is a social bookmarking
service in which users are allowed to tag both URLs and scientific
papers. A power law distribution of tags on scientific
references emerges. In particular, roughly 81% of resources
were described by no more than 5 different tags (and roughly
58% by fewer than 3). Furthermore, there is a small portion
of frequently adopted tags, and a long tail of tags (roughly
81%) being used fewer than 5 times overall. Matrix TR is
thus rather sparse: if we were to select any pair of tags $t_i$
and $t_j$, most of the components of the corresponding vectors
$t_r(i)$ and $t_r(j)$ would be 0, and so would be their inner
product. In other words, the cosine similarity between any
two folksonomy tags would be very close to 0, regardless of
what the selected tags are; recommending tags (in Phase 2)
based on such a metric would thus be unfruitful.
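The effect of sparsity on cosine similarity can be seen on a toy matrix. The sketch below is illustrative only: the function name `cosine_tag_similarity` and the three-tag example are assumptions, and the zero-norm guard is a choice of the example rather than part of the metric.

```python
import numpy as np

def cosine_tag_similarity(tr):
    """Pairwise cosine similarity between the rows (tags) of TR."""
    norms = np.linalg.norm(tr, axis=1)
    norms[norms == 0] = 1.0  # guard: tags that label nothing get similarity 0
    unit = tr / norms[:, None]
    return unit @ unit.T

# Toy matrix: tag 0 labels only resource 0, tag 1 only resource 1,
# tag 2 labels both resources.
toy_tr = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
toy_sim = cosine_tag_similarity(toy_tr)
# toy_sim[0, 1] is exactly 0: tags 0 and 1 share no resource, although
# both co-occur with tag 2.
```

In a long-tailed folksonomy most tag pairs look like tags 0 and 1 in this example, which is why the resulting similarity matrix is almost entirely zero.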

Although classical similarity measures based on
co-occurrences are inadequate in scenarios characterized by
power law distributions, other metrics have been proposed
and successfully used in Information Retrieval which could
potentially be applied in our domain. We considered, in particular,
one of the state-of-the-art techniques, namely Latent
Semantic Indexing (LSI) [11]. LSI approximates the matrix
TR by retaining only its top k singular values. This is equivalent
to mapping TR onto a low-dimensional vector space,
with k dimensions (called Latent Space). Similarities between
tags (respectively, resources) are computed in the Latent
Space by applying the cosine similarity. Unfortunately,
the application of this technique raised two concerns:
(i) the computation of LSI on large matrices is very
costly (and could indeed be practically infeasible in real
folksonomies); (ii) the tuning of parameter k is complex and
time-consuming, and the quality of the produced results is
very sensitive to its value.
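The sensitivity to k can be illustrated with a small sketch based on a dense truncated SVD. This is not the implementation used in our experiments: the function name `lsi_tag_similarity` is hypothetical, and a dense `numpy.linalg.svd` call would not be practical on matrices of realistic size.

```python
import numpy as np

def lsi_tag_similarity(tr, k):
    """Cosine similarity between tags in a k-dimensional latent space."""
    u, s, _ = np.linalg.svd(tr, full_matrices=False)
    latent = u[:, :k] * s[:k]  # tag coordinates on the top-k singular directions
    norms = np.linalg.norm(latent, axis=1)
    norms[norms == 0] = 1.0    # guard against all-zero rows
    unit = latent / norms[:, None]
    return unit @ unit.T

toy_tr = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
sim_k1 = lsi_tag_similarity(toy_tr, k=1)  # every tag collapses onto one axis
sim_k2 = lsi_tag_similarity(toy_tr, k=2)  # full rank: plain cosine similarity
```

On this toy matrix, tags 0 and 1 are maximally similar for k = 1 and orthogonal for k = 2, which gives a flavour of how strongly the outcome depends on the choice of k.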



More suitable to the folksonomy domain are techniques
that rely on the mutual reinforcement principle. One of
the most popular techniques based on it is SimRank [10].
SimRank uses an iterative approach to compute similarities
whereby, in each iteration, the similarity between any two
objects (be they tags or resources) is computed based on
the similarities already computed in the previous iteration.

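If we were to adopt SimRank, the equations used at the
k-th iteration would be, in the standard bipartite formulation of [10],

$$ st^{k}(t_a, t_b) = \frac{C_1}{|r(t_a)| \, |r(t_b)|} \sum_{r_i \in r(t_a)} \sum_{r_j \in r(t_b)} sr^{k-1}(r_i, r_j) \qquad (2) $$

$$ sr^{k}(r_a, r_b) = \frac{C_2}{|t(r_a)| \, |t(r_b)|} \sum_{t_i \in t(r_a)} \sum_{t_j \in t(r_b)} st^{k-1}(t_i, t_j) \qquad (3) $$

where: (i) $st^k(t_a, t_b)$ (resp., $sr^k(r_a, r_b)$) denotes the similarity
between $t_a$ and $t_b$ (resp., $r_a$ and $r_b$) at the k-th iteration;
(ii) the set $r(t_a)$ (resp., $t(r_a)$) is the set of resources (resp.,
tags) associated with $t_a$ (resp., $r_a$); (iii) $C_1$ and $C_2$ are two
normalization constants belonging to the real interval [0, 1].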

Equations (2)-(3) suffer from two main drawbacks that limit
their applicability in our setting:

• SimRank does not take into account the number of 
times a tag intervenes in labeling a resource, thus dis- 
carding valuable information available within the folk- 
sonomy. 

• SimRank does not distinguish between tags that have 
labeled exactly the same resource, and tags that (by 
means of one or more iterative steps) have been label- 
ing related, but different, resources. 
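The first drawback is visible in a direct sketch of bipartite SimRank: its input is only the set of resources attached to each tag, so labeling frequencies cannot even be expressed. The code below is a minimal illustration of the standard SimRank update, with the usual convention that an object has similarity 1 to itself; the function name, the default values of `c1` and `c2`, and the toy data are assumptions of the example.

```python
from itertools import product

def bipartite_simrank(tag_resources, c1=0.8, c2=0.8, iterations=5):
    """tag_resources maps each tag to the set of resources it labels."""
    tags = sorted(tag_resources)
    resource_tags = {}
    for tag, labeled in tag_resources.items():
        for resource in labeled:
            resource_tags.setdefault(resource, set()).add(tag)
    resources = sorted(resource_tags)
    st = {(a, b): float(a == b) for a, b in product(tags, repeat=2)}
    sr = {(a, b): float(a == b) for a, b in product(resources, repeat=2)}

    def step(items, neighbours, other, c):
        # One SimRank update: average the previous similarities of all
        # neighbour pairs, then damp by the constant c.
        new = {}
        for a, b in product(items, repeat=2):
            na, nb = neighbours[a], neighbours[b]
            if a == b:
                new[a, b] = 1.0
            elif not na or not nb:
                new[a, b] = 0.0
            else:
                total = sum(other[i, j] for i in na for j in nb)
                new[a, b] = c * total / (len(na) * len(nb))
        return new

    for _ in range(iterations):
        # Both updates read the similarities of the previous iteration.
        st, sr = (step(tags, tag_resources, sr, c1),
                  step(resources, resource_tags, st, c2))
    return st, sr

toy = {"a": {"r1"}, "b": {"r2"}, "c": {"r1", "r2"}}
st1, _ = bipartite_simrank(toy, iterations=1)
st2, _ = bipartite_simrank(toy, iterations=2)
```

In this toy folksonomy, tags "a" and "b" never label the same resource; they become similar only at the second iteration, through the resources they each share with "c", and nothing in the update separates this indirect evidence from direct co-labeling.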

Although we share with SimRank the idea of computing
tag/resource similarity by means of an application of the
mutual reinforcement principle, we advocate two main
changes, in line with the discussion above. To begin with,
the frequency with which a tag intervenes in labeling a resource
is a very important piece of information that should
be leveraged in the similarity computation process. Furthermore,
a new factor (which we will call mutual reinforcement
factor) should be introduced, to give more relevance to tags
that labeled the very same resources, with respect to those
that labeled related (but not the very same) resources.

We have thus derived a novel similarity metric, specifically
conceived for the folksonomy setting. In detail, the similarity
computation is performed recursively. For the base
case, given a pair of tags $(t_a, t_b)$ and a pair of resources
$(r_a, r_b)$, the tag similarity $st^0(t_a, t_b)$ and the resource
similarity $sr^0(r_a, r_b)$ are defined as follows

$$ st^0(t_a, t_b) = \delta_{ab} \qquad sr^0(r_a, r_b) = \delta_{ab} \qquad (4) $$

where $\delta_{ab}$ is 1 if a = b and 0 otherwise. Equation (4) states
that, in the initial step, each tag (resp.,
resource) is similar only to itself and it is dissimilar to all
other tags (resp., resources).

At the k-th step, let $st^{k-1}(t_a, t_b)$ (resp., $sr^{k-1}(r_a, r_b)$) be
the tag (resp., resource) similarity between $t_a$ and $t_b$ (resp.,
$r_a$ and $r_b$). The following rules can be applied to compute
$st^k(t_a, t_b)$ (resp., $sr^k(r_a, r_b)$)

$$ st^k(t_a, t_b) = \frac{ST^k(t_a, t_b)}{\sqrt{ST^k(t_a, t_a)} \cdot \sqrt{ST^k(t_b, t_b)}} \qquad (5) $$

$$ sr^k(r_a, r_b) = \frac{SR^k(r_a, r_b)}{\sqrt{SR^k(r_a, r_a)} \cdot \sqrt{SR^k(r_b, r_b)}} \qquad (6) $$

$$ ST^k(t_a, t_b) = \sum_{i,j=1}^{n_r} TR_{ai} \cdot \Psi_{ij} \cdot sr^{k-1}(r_i, r_j) \cdot TR_{bj} \qquad (7) $$

$$ SR^k(r_a, r_b) = \sum_{i,j=1}^{n_t} TR_{ia} \cdot \Psi_{ij} \cdot st^{k-1}(t_i, t_j) \cdot TR_{jb} \qquad (8) $$

Here $\Psi_{ij}$ is equal to 1 if $i = j$, while it is equal to $\psi$ if
$i \neq j$. $\psi$ is what we call the mutual reinforcement factor, and is
a value belonging to the real interval [0, 1].

Equations (5)-(6) rely on the following intuitions. Given a
pair of tags $(t_a, t_b)$, at the k-th iteration, we consider all pairs
of resources $(r_i, r_j)$ in the folksonomy and we take their similarity
$sr^{k-1}(r_i, r_j)$ into account to compute $st^k(t_a, t_b)$. In
particular, we compute a weighted sum of all the similarity
values $sr^{k-1}(r_i, r_j)$, where the weights reflect the strength
of the association between the tag $t_a$ and the resource $r_i$,
and between the tag $t_b$ and the resource $r_j$. As a consequence, the
higher the similarity between $r_i$ and $r_j$, the higher the contribution
of the association between $t_a$ and $r_i$, as well as $t_b$
and $r_j$. Finally, the mutual reinforcement factor $\psi$ is instrumental
in giving higher relevance to tags that labeled the
very same resources (resp., to resources labeled by the very
same tags): identical resources always contribute with weight 1,
whereas the higher $\psi$, the higher the relevance
assigned to similar but distinct resources (resp., tags) in the tag (resp.,
resource) similarity computation.

We argue that Equations (5)-(8) are able to effectively address
the power law challenge we outlined above. In fact,
when computing tag (resp., resource) similarity, our measure
leverages the similarity of all pairs of resources (resp.,
tags) in the folksonomy. While cosine similarity restricts
its attention to those resources jointly labeled by two tags
(resp., those tags jointly labeling two resources), which are
usually very few, our metric iteratively propagates similarity
scores by considering all the pairs of similar resources
jointly labeled by the two tags (resp., all the pairs of similar
tags jointly labeling two resources). In this way, our measure
can be applied in settings characterized by power law
distributions of tag usage.

Similarly to state-of-the-practice recommender systems,
we expect the above tag similarity computation to be performed
offline, at regular intervals of time (e.g., daily,
weekly), depending on the growth of the system and the
available computational resources. With these similarities
pre-computed, we now proceed to discuss how tag recommendation
and expansion are performed, both when labeling
new resources and when querying the folksonomy.

2.2 Phase 2: Tag Expansion 

Key to our approach is the use of the previously computed
tag similarities to automatically expand the tag set chosen
by the user, both when labeling a new resource and when
querying the folksonomy. Note that, by performing tag expansion
upon adding a new resource to the system, we implicitly
induce the creation of a denser folksonomy; furthermore,
by performing tag expansion upon querying the system
using the very same approach, we implicitly induce the
community to use a common vocabulary. Taken together,
these tag expansions have the effect of providing more accurate
answers to users' searches within large-scale folksonomies.

The approach can be summarised as follows. Let tSet =
$\{t_1, \ldots, t_n\}$ be the set of user-selected tags, either to label
a resource or to submit a query in the folksonomy. Let $t_j$
be a tag in tSet and $t_i$ a tag not in tSet. We assign a score
$sc(t_i, t_j)$ to $t_i$ with respect to $t_j$ based on: (i) $st(t_i, t_j)$,
the similarity between $t_i$ and $t_j$ as previously computed; (ii)
$count(t_i)$, the number of times $t_i$ appears in the folksonomy;
(iii) $IRF(t_i)$, the inverse resource frequency of $t_i$ (similar
to IDF in Information Retrieval). This is a measure of the
general importance of $t_i$ within the whole folksonomy, and it
is obtained by dividing the total number of resources in the
folksonomy by the number of resources labeled by $t_i$, and
then taking the logarithm of the quotient.

More precisely, $sc(t_i, t_j)$ is computed as follows

$$ sc(t_i, t_j) = st(t_i, t_j) \cdot \log count(t_i) \cdot IRF(t_i) \qquad (9) $$



Equation (9) assigns high scores to those tags that are
similar to $t_j \in tSet$ (as per our similarity metric) and, crucially,
that are both widely used ($count(t_i)$) and important
($IRF(t_i)$) in the overall folksonomy. Intuitively, our
approach expands user-selected tags with related tags that
are part of the emerging common vocabulary of widely used
tags. Note that we compute the logarithm of $count(t_i)$ to
give equal weight to frequently used tags and to important
ones (as computed by $IRF(t_i)$, which, by definition, already
computes the logarithm).

Finally, the total score SC of $t_i$ with respect to tSet is
obtained by summing the scores of $t_i$ with respect to all the
tags of tSet

$$ SC(t_i, tSet) = \sum_{t_j \in tSet} sc(t_i, t_j) \qquad (10) $$
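The scoring and expansion step could be sketched as follows. The function name `expand_tag_set`, the dictionary-based inputs, the use of the natural logarithm, and the alphabetical tie-breaking are all assumptions of this example; the text above does not fix them.

```python
import math

def expand_tag_set(tset, similarity, count, resource_count, n_resources, k):
    """Return the k tags outside tset with the highest total score SC.

    similarity     -- dict mapping (t_i, t_j) to the precomputed st(t_i, t_j)
    count          -- dict mapping each tag to its number of occurrences
    resource_count -- dict mapping each tag to the number of resources it labels
    """
    scores = {}
    for ti in count:
        if ti in tset:
            continue
        irf = math.log(n_resources / resource_count[ti])
        weight = math.log(count[ti]) * irf
        # Sum the pairwise scores sc(t_i, t_j) over all t_j in tSet.
        scores[ti] = weight * sum(similarity.get((ti, tj), 0.0) for tj in tset)
    # Highest score first; ties broken alphabetically (assumed convention).
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]

# Hypothetical statistics for a tiny folksonomy with 10 resources.
count = {"python": 10, "py": 4, "java": 8, "snake": 1}
resource_count = {"python": 5, "py": 2, "java": 4, "snake": 1}
similarity = {("py", "python"): 0.9,
              ("java", "python"): 0.2,
              ("snake", "python"): 0.9}
expanded = expand_tag_set({"python"}, similarity, count, resource_count, 10, k=2)
```

Here "snake" is as similar to "python" as "py" is, but it occurs only once, so the logarithm of its count is 0 and it is never recommended; this is the intended bias towards the widely used vocabulary.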



Although in this paper we will be evaluating a fully automatic
approach, whereby the user-selected set of tags tSet is
transparently expanded with the k highest scoring tags according
to Equation (10), a more interactive approach could
be adopted, whereby users are suggested up to k expansion
tags, and they can decide which ones, if any, to use. Such an
approach may lead to even more accurate results than those
we will report in Section 3, and it is thus worth exploring in
the future, by means of controlled user studies.

2.3 Taming Computational Complexity 

The practical usability of our approach is strictly linked
to the computational complexity of Equations (5)-(8). In particular:

• From a theoretical standpoint, the computation of each
pairwise tag similarity may require an infinite number
of iterations. As a consequence, a stopping criterion
is required so that the execution of Equations (5)-(6) terminates
after a finite (and hopefully low) number of
iterations.

• Equation (5) (resp., Equation (6)) requires the computation
of $n_r^2$ resource-resource (resp., $n_t^2$ tag-tag) similarities,
at each k-th step. This could make our similarity
measure inapplicable in practical cases, because each
iteration would require exactly $n_t^2 \times n_r^2$ computations.
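To give a sense of scale, this count can be instantiated with the BibSonomy figures reported in Section 3.1 (147,076 distinct tags and 578,587 scientific references). The rounding is ours and the result is only an order-of-magnitude estimate:

$$ n_t^2 \times n_r^2 \approx (1.47 \times 10^5)^2 \times (5.79 \times 10^5)^2 \approx (2.16 \times 10^{10}) \times (3.35 \times 10^{11}) \approx 7.2 \times 10^{21} $$

A naive term-by-term evaluation of the sums would therefore be out of reach at each iteration.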

However, we can show that these theoretical limits do not
apply in practice, and that, in fact, our new tag similarity
metric has a computational complexity comparable to
that of cosine similarity. First of all, convergence of Equations
(5)-(6) has been demonstrated.



Theorem 1. Let $st^k(t_a, t_b)$ and $sr^k(r_a, r_b)$ be defined as
in Equations (5)-(6). Given any pair of tags $t_a$ and $t_b$, and
any pair of resources $r_a$ and $r_b$, the sequences $st^k(t_a, t_b)$ and
$sr^k(r_a, r_b)$ converge. □

The proof of the above theorem is available in full
at http://tinyurl.com/proof-cikm2011 and is based on the
demonstration that the sequences $st^k(t_a, t_b)$ and $sr^k(r_a, r_b)$
are both bounded and non-decreasing. To complement this
theoretical result, our experiments on two real folksonomies
(Section 3.3) will provide evidence of very fast convergence.

The second important result is that Equations (5)-(6) can
be defined, without any loss of generality, as simple matrix
products (as in cosine similarity). Specifically, let $st^k$
and $sr^k$ be the tag-tag and resource-resource similarity matrices,
respectively, with $st^0 = I_t$ and $sr^0 = I_r$, where $I_t$
(resp., $I_r$) is the $n_t \times n_t$ (resp., $n_r \times n_r$) identity matrix. We
use the symbol "$\circ$" to refer to the Hadamard matrix product [5].
At the k-th step the $st^k$ and $sr^k$ matrices are computed as

$$ st^k = ST^k \circ DT^k \quad \text{and} \quad sr^k = SR^k \circ DR^k \qquad (11) $$

where

$$ ST^k = TR \times (\Psi_r \circ sr^{k-1}) \times TR^T \qquad (12) $$

$$ SR^k = TR^T \times (\Psi_t \circ st^{k-1}) \times TR \qquad (13) $$

$$ DT^k_{ij} = \frac{1}{\sqrt{ST^k_{ii}} \cdot \sqrt{ST^k_{jj}}} \qquad DR^k_{ij} = \frac{1}{\sqrt{SR^k_{ii}} \cdot \sqrt{SR^k_{jj}}} \qquad (14) $$

In the above equations, $\Psi_r$ (resp., $\Psi_t$) refers to a square
$n_r \times n_r$ (resp., $n_t \times n_t$) matrix where all the elements are
set equal to the mutual reinforcement factor $\psi$, with the
exception of the diagonal, where the elements are set to 1;
the symbol $TR^T$ represents the transpose of TR. Each step
of the tag similarity computation can thus be performed by
means of simple matrix products. This result, coupled with
the empirical observation that only a few iterative steps are
required to reach convergence (Section 3.3), makes our similarity
metric suitable in practical contexts. This conclusion
is further supported by the fact that, in our approach, tag
similarity computations are performed offline.
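The matrix formulation translates almost literally into array code. The sketch below uses dense NumPy arrays for readability; the function names, the default values `psi=0.5` and `iterations=6`, and the guard for tags or resources with no assignments are assumptions of the example, and a realistic deployment would presumably rely on sparse matrices.

```python
import numpy as np

def reinforcement_matrix(n, psi):
    """n x n matrix with 1 on the diagonal and psi everywhere else."""
    m = np.full((n, n), float(psi))
    np.fill_diagonal(m, 1.0)
    return m

def normalise(s):
    """Divide entry (i, j) by sqrt(s[i, i]) * sqrt(s[j, j])."""
    d = np.sqrt(np.diag(s))
    d[d == 0] = 1.0  # guard: rows with no assignments keep similarity 0
    return s / np.outer(d, d)

def mutual_reinforcement_similarity(tr, psi=0.5, iterations=6):
    """Iterate the matrix form of the metric, starting from identity matrices."""
    n_t, n_r = tr.shape
    psi_t = reinforcement_matrix(n_t, psi)
    psi_r = reinforcement_matrix(n_r, psi)
    st, sr = np.eye(n_t), np.eye(n_r)
    for _ in range(iterations):
        # Both products use the similarities of the previous step.
        big_st = tr @ (psi_r * sr) @ tr.T
        big_sr = tr.T @ (psi_t * st) @ tr
        st, sr = normalise(big_st), normalise(big_sr)
    return st, sr

# Tag 0 labels resource 0, tag 1 labels resource 1, tag 2 labels both.
toy_tr = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
st_1, _ = mutual_reinforcement_similarity(toy_tr, psi=0.5, iterations=1)
st_2, _ = mutual_reinforcement_similarity(toy_tr, psi=0.5, iterations=2)
```

After one step the result coincides with cosine similarity, so tags 0 and 1 have similarity 0; after two steps their similarity is 0.25, because the two resources they label have themselves become similar through tag 2.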

3. EVALUATION 

In order to evaluate the performance of our approach, we
built a prototype in Java and MySQL and we performed
experiments using two well known social tagging websites,
i.e., BibSonomy and CiteULike. The experiments we carried
out aimed at answering the following questions: (i) Is our
approach able to increase the accuracy of searches? And,
if so, to what extent does the improvement depend on the
underlying similarity metric in use? (ii) Does our approach
scale to large folksonomies?

After presenting the two datasets used for experimenta- 
tion, we will address each of the above questions in turn. 

3.1 Datasets 

We performed experiments on the following two datasets 
extracted from real large-scale folksonomies. 



• BibSonomy (http://www.bibsonomy.org/) is a social
bookmarking website promoting the sharing of both
scientific references and general URLs. We downloaded
a snapshot of this Web site in June 2009, containing
648,924 bookmarks¹ and 4,696 users who had tagged
578,587 scientific references overall, using 147,076 distinct
tags.

• CiteULike (http://www.citeulike.org/) is a social
bookmarking website that aims at promoting and developing
the sharing of scientific references amongst
researchers. We downloaded a snapshot dataset with
57,053 users, 1,928,302 papers, 401,620 different tags,
and 2,281,609 bookmarks.

3.2 Accuracy of User Searches 

3.2.1 Simulation Setup 

The first experiment we conducted aimed at determining
the ability of our approach to retrieve resources of relevance
to the user querying the folksonomy. This experiment was
carried out as follows: we split each dataset into two different
ones, called train set and test set; the split was performed
multiple times at random, with the former containing 90%
of bookmarks, and the latter containing the remaining 10%.
After this split, we considered two different versions of the
involved datasets, which we will refer to as original (unmodified)
ones and enriched ones. The enriched version $f^+$
of each dataset $f$ was obtained as follows. We computed similarities
between all pairs of tags in the train set. After this,
we examined all bookmarks in the original $f$; each bookmark
(u, r, tSet) was then enriched by adding to tSet the k tags
that our approach would recommend, as per Equation (10).
In this experiment, k was set equal to $\lceil 0.5 \cdot |tSet| \rceil$ if $|tSet| > 6$,
and equal to 3 otherwise.

In so doing, we simulate the automatic expansion of user-selected
tags, as it would happen when labeling resources.
The folksonomy $f$ would thus be induced to grow to the enriched
$f^+$, an emerging folksonomy containing a more common
and widely accepted vocabulary.

Having performed this preparatory step, we considered
$f$ and, for each bookmark (u, r, tSet) in the test set, we
used tSet to query the train set and retrieve the q resources
most relevant to the query. Note that, according to our
approach, the query tag set tSet was first expanded with
the k most related tags, with k set as above. To determine the
relevance of a resource to a query tSet, we computed the
TF-IDF coefficient assigned to such resource for each query
tag $t_j \in tSet$, then summed up these values. The q most
relevant resources were then offered to the user as the answer
to her query. We have experimented with $q \in \{5, 10, 20\}$,
and measured the percentage of times the searched resource
r appeared in the top q retrieved resources. We call this
measure retrieved ratio. We repeated the same procedure
for the folksonomy $f^+$ to see if the corresponding retrieved
ratio increased (or decreased) with respect to the one of $f$.
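The ranking and the retrieved ratio could be computed along the following lines. The exact TF-IDF variant is not specified above, so the one used here (tag share within a resource, multiplied by the logarithm of the inverse fraction of resources carrying the tag) is an assumption, as are the function names and the toy data.

```python
import numpy as np

def tfidf_scores(tr, query_rows):
    """Sum of the TF-IDF weights of the query tags, for every resource."""
    totals = tr.sum(axis=0)
    totals[totals == 0] = 1.0
    tf = tr / totals                      # share of each tag within a resource
    df = np.count_nonzero(tr, axis=1)     # number of resources carrying each tag
    idf = np.log(tr.shape[1] / np.maximum(df, 1))
    return (tf[query_rows] * idf[query_rows][:, None]).sum(axis=0)

def retrieved_ratio(tr, queries, q):
    """Fraction of queries whose target resource is ranked in the top q.

    queries -- list of (query_rows, target_column) pairs
    """
    hits = 0
    for rows, target in queries:
        top = np.argsort(-tfidf_scores(tr, rows), kind="stable")[:q]
        hits += int(target in top)
    return hits / len(queries)

# Toy train matrix: 3 tags (rows) by 3 resources (columns).
toy_tr = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 3.0]])
# Two test bookmarks, both targeting resource 0.
toy_queries = [([0, 1], 0), ([1], 0)]
ratio_top1 = retrieved_ratio(toy_tr, toy_queries, q=1)
ratio_top2 = retrieved_ratio(toy_tr, toy_queries, q=2)
```

In this toy setting the second query uses a single, ambiguous tag and misses its target at q = 1, which is the kind of failure that expanding the query tag set is meant to reduce.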

The above process follows the intuition that, if a user labeled
a specific resource r with the set of tags tSet, then tSet
would very likely be the set of tags such a user would employ
to query the folksonomy when wishing to retrieve r. However,
due to the number of synonyms, homonyms and polysemies,
as well as the heterogeneity of users, r could have been described
by other users with different tags. This implies that tSet
could be unsuitable to retrieve r per se, and thus tag expansion,
performed both when labeling and when querying the
system (i.e., $f^+$), should yield better results (i.e., increased
retrieved ratio).

¹ In this context, a bookmark is defined as a triplet
(u, r, tSet), where tSet is the set of tags originally assigned
by the user u to label the resource r.

Figure 1: Retrieved Ratio on BibSonomy and CiteULike

We repeated the described process 10 times over different 
train and test random splits of the datasets. The results we 
describe next represent the average values of these runs. 
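The repeated 90/10 splitting could be organised as in the sketch below; the function name `random_splits`, the fixed seed, and the use of Python's `random` module are assumptions made for reproducibility of the example only.

```python
import random

def random_splits(bookmarks, runs=10, train_fraction=0.9, seed=0):
    """Yield (train, test) pairs obtained by independent random shuffles."""
    rng = random.Random(seed)
    for _ in range(runs):
        shuffled = list(bookmarks)
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        yield shuffled[:cut], shuffled[cut:]

# Placeholder bookmarks; real ones would be (u, r, tSet) triplets.
toy_bookmarks = list(range(100))
splits = list(random_splits(toy_bookmarks))
```

The retrieved ratio would then be computed on each pair and the ten values averaged.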

3.2.2 Results 

Figure 1 shows how the retrieved ratio of our approach
varies across the two datasets, for different values of $\psi$, with
respect to the case where no enrichment was performed. The
same figure also illustrates a comparison of our approach
with respect to cases where cosine similarity, Latent Semantic
Indexing, and SimRank were used as the underlying tag
similarity metric. The following two main observations can
be drawn:

• The lowest retrieved ratio is the one obtained when
no enrichment is performed. In particular, the retrieved
ratio is up to 70% better when using an enriched
folksonomy than when using an unmodified one.
This means that, by transparently enriching the folksonomy
when labeling resources, we successfully induce
the construction of a denser and more meaningful
folksonomy, over which resource retrieval performs
better overall. This result underlines the importance
of supporting users in their tagging and querying activity.

• Within the enriched folksonomy, the approach based
on our novel tag similarity metric outperformed all
others, followed by LSI, while SimRank and cosine
lag behind. More precisely, results obtained by applying
our similarity metric are up to 50% better than
those obtained by applying cosine similarity or SimRank,
and up to 8% better than those obtained by
applying LSI.

3.3 Scalability 

3.3.1 Simulation Setup 

As previously pointed out, the highest cost incurred by our
approach lies in the computation of pairwise tag similarities.
In Sections 2.1 and 2.3 we have shown that our metric is recursive
and that each step is no more expensive than the computation
of classical similarity measures, such as cosine similarity.
We have also stated that our formulation is convergent
(Theorem 1). However, it is necessary to investigate how
many steps are necessary to reach convergence in practice.
Figure 2: Scalability of our Approach on BibSonomy
and CiteULike



To experimentally perform this computation, we have defined
two parameters, $\delta_t^k$ and $\delta_r^k$, which are computed from the
similarity matrices by means of the matrix 1-norm (15).

Here, $st^k$ (resp., $sr^k$) is the tag-tag (resp., resource-resource)
similarity matrix at the k-th step (see Equation (11)),
whereas the symbol $\|\cdot\|_1$ indicates the 1-norm of a matrix.
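A convergence indicator of this kind could be computed as in the following sketch. The specific form used here, the relative change between consecutive steps, is an assumption made for illustration and may differ from the exact definition of the two parameters; the function name `relative_change` is likewise hypothetical.

```python
import numpy as np

def relative_change(current, previous):
    """Assumed indicator: ||current - previous||_1 / ||previous||_1.

    With ord=1, numpy.linalg.norm returns the matrix 1-norm, i.e. the
    maximum absolute column sum.
    """
    return np.linalg.norm(current - previous, 1) / np.linalg.norm(previous, 1)

previous = np.eye(2)
current = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
delta = relative_change(current, previous)
```

Under this assumption, the iteration would be stopped as soon as the indicator drops below a chosen threshold for both the tag-tag and the resource-resource matrices.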

3.3.2 Results 

Figure 2 plots the variation of $\delta_t^k$ and $\delta_r^k$ as k increases, for
each of our datasets. As shown, in the practical settings we
have experimented with, the computation of our similarity
metric exhibits very fast convergence. As an example, across
all considered datasets, $\delta_t^k$ and $\delta_r^k$ are less than 0.1 after just
six iterations. In other words, in the datasets we used, by
accepting a small error in the similarity computation,
we can stop our iterative procedure within six iterations.
Using a server equipped with a quad-core processor
and 32GB of RAM (a modest configuration by production
standards), we computed all
similarity measures across both datasets in less than 48
hours. As the computation of pairwise tag similarity is performed
periodically (e.g., weekly) offline, this result suggests
that our similarity measure is scalable and well suited to be
applied even when operating in large folksonomies.

4. CONCLUSIONS 

In this paper, we have proposed an approach that enables
the effective retrieval of resources within folksonomies. The
approach relies on an innovative tag similarity metric that
is based on the mutual reinforcement principle. This metric
is used both when users label resources, so as to automatically
enrich the user-selected tag set with highly related tags
already present in the folksonomy, and when users query
the folksonomy. Our experimental evaluation has shown
that the accuracy of searches obtained with our metric
is markedly higher than that achieved by applying classical
metrics, thus confirming its suitability in scenarios characterized
by power-law distributions of tags (as is the case in
many real world folksonomies). Finally, the computational
cost of our iterative approach is limited, as convergence is
guaranteed, and in practice reached after a handful of iterations.